BACKGROUND OF THE INVENTION

The present invention relates generally to communications
systems, and more specifically to a technique for managing a
multiplicity of time-based queues at an output port of a node in 
a communications network. 

A conventional communications system includes a plurality of 
nodes interconnected by a plurality of data transmission paths to 
form at least one communications network. The plurality of nodes 
includes at least one node configurable as an ingress node for 
originating a data path and at least one node configurable as an 
egress node for terminating a data path through the network. Each 
node on the network typically comprises a network switch that can 
be used to interconnect two or more of the plurality of data
paths. Each switch includes at least one input port and at least 
one output port coupled to respective data paths and is typically 
configured to allow each output port to receive digital data in 
the form of, e.g., packets from any input port. The switch
determines the appropriate output port for a particular packet by 
accessing information contained in a header field of the packet. 

In the conventional communications system, a Class of 
Services (CoS) contract is typically formed between an operator of 
the communications network and a user of the network that
specifies the user's parameters for transmitting data over the 
network. For example, the CoS contract may specify the user's 
bandwidth for transmitting packets over the network. Further, 
because each output port of a network switch may receive packets
from any input port of the switch, each output port typically
includes one or more queues configured to buffer at least one
user's packet flow for a particular class of service. The switch
typically determines the required class of service for each packet
in the flow by accessing information contained in the packet
header field.

The network switch typically employs a scheduling algorithm
for determining the order in which the packets are to be
transmitted from the output port queue(s). For example, each
output port may comprise a respective time-sorted queue for each
packet flow. Further, the switch may employ a Weighted-Fair
Queuing (WFQ) scheduling algorithm operative to determine the 
order in which the packets are to be transmitted from the time- 
sorted queues. The WFQ scheduling algorithm may compute a 
timestamp having a value corresponding to some virtual or actual 
time for the packet at the head of each queue. Next, the WFQ
scheduling algorithm may determine which head-of-line packet has
the timestamp with the lowest value, and then select the
corresponding queue as the next queue from which to transmit a 
packet. The WFQ scheduling algorithm allows the switch to set 
parameters to guarantee the particular class of service for each 
packet flow. 
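
As a rough illustration of the head-of-line selection described above, the following Python sketch scans every queue for the lowest head timestamp. The (timestamp, payload) tuple layout, the function name select_next_queue, and the sample values are assumptions made for illustration; computing the WFQ timestamps themselves is outside the sketch.

```python
from collections import deque

def select_next_queue(queues):
    """Return the index of the non-empty queue whose head has the lowest timestamp."""
    best = None
    for i, q in enumerate(queues):
        if q and (best is None or q[0][0] < queues[best][0][0]):
            best = i
    return best

# Each packet is modeled as a (timestamp, payload) tuple; values are illustrative.
queues = [deque([(7, "a1"), (9, "a2")]), deque([(3, "b1")]), deque()]

order = []
i = select_next_queue(queues)
while i is not None:
    order.append(queues[i].popleft()[1])
    i = select_next_queue(queues)
```

The scan visits every queue for each transmitted packet, so its cost grows with the number of flows.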

The network switch may alternatively employ a scheduling 
algorithm based on a binary tree of comparators to determine the
next packet to be transmitted from the output port queue(s).
However, like the WFQ scheduling algorithm, the scheduling 
algorithm based on the binary tree of comparators can typically 
only be used to manage a limited number of packet flows. For 
example, a binary tree of N-1 comparators may have log_2 N levels,
in which N is the number of queued flows. Such a tree can 
become very large and costly to implement as the number of 
packet flows increases. Further, the time required by such a 
binary tree of comparators is typically proportional to log_2 N or
worse, which makes this approach unacceptable as N gets large. 
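
To make the growth concrete, the short Python sketch below computes the comparator count and tree depth for a few flow counts. The function name and the sample flow counts are illustrative assumptions, and the depth is rounded up when N is not a power of two.

```python
def comparator_tree_size(n_flows):
    """Return (comparators, levels) for a binary comparator tree over n_flows inputs."""
    if n_flows < 2:
        return 0, 0
    comparators = n_flows - 1
    levels = (n_flows - 1).bit_length()  # ceil(log2(n_flows)) without floating point
    return comparators, levels

sizes = {n: comparator_tree_size(n) for n in (64, 1024, 1_000_000)}
```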

Typically, each output port of the switch has a large
aggregate "bandwidth" (i.e., the capacity to transfer data per
unit of time) and each output line card encapsulates the packets
into one or more of the various logical and physical
data transmission types (e.g., SONET OC-48 POS, SONET OC-3 POS,
DS-3, Gigabit Ethernet, SONET OC-48 ATM, etc.). Each output
line card may be configured to have a multiplicity of outputs 
whose aggregate capacity is equal to the capacity of the switch 
output coming into the card. For example, if the capacity of 
the switch output is approximately 2.4 x 10^9 bits/sec
(approximately that of SONET OC-48), then one configuration of
an output line card may have four (4) ports of 600 x 10^6 bits/sec
and another configuration may have sixteen (16) ports of 150 x
10^6 bits/sec. Both configurations would make good economic use
of the switch output. 
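
The arithmetic behind the two example configurations can be checked with a few lines of Python. The dictionary keys and variable names are illustrative; the port counts and rates are the ones quoted above.

```python
SWITCH_OUTPUT_BPS = 2_400_000_000  # approximately the SONET OC-48 rate quoted in the text

# (number of ports, bits/sec per port) for the two example line card configurations
configs = {
    "4 ports": (4, 600_000_000),
    "16 ports": (16, 150_000_000),
}

aggregate = {name: ports * rate for name, (ports, rate) in configs.items()}
fully_used = all(total == SWITCH_OUTPUT_BPS for total in aggregate.values())
```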

Because a number of types of output line cards may be 
designed, it would be desirable to have as much of the design as 
possible in common. Specifically, it would be desirable to have
the implementation of the WFQ scheduling algorithm be software 
configurable to handle any combination of packet encapsulation 
and physical layer types. Further, the enqueuing and dequeuing 
of packets into the time-sorted queues should be done at a fast 
enough rate for transferring minimum size packets at full output
line data rate capacity using current technology. 

BRIEF SUMMARY OF THE INVENTION

In accordance with the present invention, a technique for
scheduling the transmission of packets from one or more output
port queues of a network switch is disclosed that can handle a
large number of packet flows. Benefits of the presently disclosed
scheduling technique are achieved by providing a memory at each
output port of the network switch, the memory comprising at least
one time-based queue, generating one or more acceleration bit-
strings for use in identifying the packet in the time-based queue
having an associated timestamp with the lowest value, and 
scheduling that packet as the next packet to be transmitted over 
the network. A single time-based queue can buffer packets 
corresponding to one or more packet flows associated with a single 
channel in the network. Alternatively, the memory can be divided 
into a plurality of time-based queues to manage the transmission 
of packet flows associated with a corresponding plurality of 
channels in the network. 

In one embodiment, the scheduling technique includes 
receiving a plurality of packets from one or more packet flows at 
a respective time-based output port queue of the network switch,
in which each packet has a timestamp associated therewith. Next,
each packet is inserted into a respective timeslot of the output
port queue, as indexed by its associated timestamp. The binary
value of the timestamp is then partitioned into a plurality of
sub-fields, each sub-field comprising one or more bits and 
corresponding to a predetermined level of acceleration bit- 
strings. The sub-fields of bits are used to index respective 
locations in at least one memory, and the values at these 
respective locations are subsequently asserted to generate the
acceleration bit-strings. Specifically, the value at a respective
location in a first memory configured to store a first level of
acceleration bit-strings is asserted, as indexed by a first sub-
field of bits; the value at a respective location in a second
memory configured to store a second level of acceleration bit-
strings is asserted, as indexed by a combination of the first sub-
field and a second sub-field of bits; the value at a respective
location in a third memory configured to store a third level of
acceleration bit-strings is asserted, as indexed by a combination
of the first and second sub-fields and a third sub-field of bits,
and so on for each level of acceleration bit-strings.

In order to dequeue a packet, priority encoding is then
successively performed for each level of acceleration bit-strings
to determine the respective timeslot of the time-based queue
containing the packet with the lowest-valued timestamp. To that
end, priority encoding is performed on the first level
acceleration bit-string stored in the first memory to obtain a
first level priority-encoded acceleration bit-string, priority
encoding is performed on the second level acceleration bit-string
stored in the second memory to obtain a second level priority-
encoded acceleration bit-string, priority encoding is performed on
the third level acceleration bit-string stored in the third memory
to obtain a third level priority-encoded acceleration bit-string,
and so on for each level of acceleration bit-strings. During the
above-mentioned priority encoding, each stage's memory is indexed
by a concatenation of the prior stages' priority-encoded outputs.
Next, the first, second, and third level priority-encoded
acceleration bit-strings are concatenated and used to index the
output port queue to identify the packet in the queue having the
timestamp with the lowest value. The identified packet is then
extracted from the output port queue and transmitted over the
network.

In the presently disclosed embodiment, each output line card
includes a memory having a size that is sufficient to support up
to the total bandwidth of the network switch, which may receive
packets from a plurality of flows conforming to different
bandwidth requirements. The output card memory can be divided
into a plurality of time-based queues, in which the number of
queues corresponds to the number of channels handled by the card,
each channel having one or more flows associated therewith.
Further, the size of each queue is proportional to the fractional
amount of the total bandwidth of the card used by the
corresponding channel. The presently disclosed technique can be
employed to manage the insertion and extraction of packets into
and out of the respective queues. By dividing the output card
memory into a plurality of time-based queues to manage the
transmission of packet flows associated with a plurality of
channels in the network, memory requirements of the network switch
are reduced.
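
One way to picture the proportional division of the card memory is the Python sketch below. It assumes contiguous regions, integer slot counts, and bandwidths given in arbitrary common units; the function name partition_memory and the sample figures are illustrative and are not taken from the disclosure.

```python
def partition_memory(total_slots, channel_bandwidths):
    """Split total_slots into contiguous (base, size) regions proportional to bandwidth."""
    total_bw = sum(channel_bandwidths)
    regions, base = [], 0
    for bw in channel_bandwidths:
        size = total_slots * bw // total_bw  # integer division may leave a few slots unused
        regions.append((base, size))
        base += size
    return regions

# One channel using half of the card bandwidth, one using a quarter, two using an eighth.
regions = partition_memory(1 << 20, [1200, 600, 300, 300])
```

Each channel then runs its own time-based queue inside its region, so the total memory tracks the card bandwidth and not the number of channels.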

Other features, functions, and aspects of the invention will 
be evident to those of ordinary skill in the art from the
Detailed Description of the Invention that follows.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWING

The invention will be more fully understood with reference
to the following Detailed Description of the Invention in
conjunction with the drawings of which:

Fig. 1 depicts a block diagram of a communications system 
including at least one node configured according to the present 
invention; 

Fig. 2 depicts a block diagram of an illustrative 
embodiment of a network switch constituting the node of Fig. 1;

Fig. 3 depicts a conceptual representation of a time-based 
queue included in an output line card of the switch of Fig. 2; 

Fig. 4 depicts a block diagram of an illustrative 
embodiment of the output line card of Fig. 3 including the time-
based queue, a queue controller, and a first memory for storing
acceleration bit-strings; 

Fig. 5 depicts a block diagram of the queue controller and 
acceleration bit-string memory of Fig. 4 performing priority 
encoding; 

Figs. 6a-6c are diagrams depicting a technique for
preventing timestamp uncertainty across a plurality of
consecutive time intervals; 

Fig. 7 depicts a second memory of the output line card of 
Fig. 3 divided into a plurality of time-based queues; 

Fig. 8 depicts a pseudo code representation of a method of
inserting at least one packet descriptor into a respective
timeslot of the time-based queue of Fig. 4 and generating a 
plurality of acceleration bit-strings corresponding thereto for 
storage in the acceleration bit-string memory of Fig. 4; and 

Fig. 9 depicts a pseudo code representation of a method of
priority encoding the plurality of acceleration bit-strings of
Fig. 8 and using the priority-encoded bit-strings to extract the
packet descriptor of Fig. 8 having the lowest-valued timestamp 
for subsequent transmission over a network. 



DETAILED DESCRIPTION OF THE INVENTION

U.S. Provisional Patent Application No. 60/264,095 filed 
January 25, 2001 is incorporated herein by reference. 

A method for scheduling the transmission of data units from 
one or more output port queues of a network switch is provided 
that can be used to manage a large number of data flows through
the switch. The presently disclosed scheduling technique divides
an output line card memory into a plurality of time-based queues
to manage the transmission of data flows associated with a
plurality of channels in the network. The presently disclosed
technique further employs one or more acceleration bit-strings to
identify the data unit in the time-based queue having an
associated timestamp with the lowest value, and schedules the
identified data unit as the next data unit to be transmitted over
the network.

Fig. 1 depicts an illustrative embodiment of a
communications system 100 comprising a communications network 102
that includes at least one node configured to schedule the
transmission of data units over the network 102, in accordance
with the present invention. For example, the network 102 may
comprise a packet switched communications network. In the
illustrated embodiment, the network 102 comprises a plurality of
nodes 106-113 interconnected by a plurality of data transmission
paths 120. The plurality of nodes 106-113 includes at least one
ingress node configured to originate a data path and at least one
egress node configured to terminate a data path through the
network 102. For example, the network 102 may be configured to
establish one or more packet flows for a plurality of packets
transmitted from a source device 104 coupled to the network 102 at 
the ingress node 106 to a destination device 105 coupled to the 
network 102 at the egress node 113. Accordingly, the node 106 may 
be configured as the ingress node and the node 113 may be 
configured as the egress node for transmitting packets from the 
source device 104 to the destination device 105 via a plurality of 
data paths 120 traversing at least a portion of the intermediate 
nodes 107-112. 

For example, each of the nodes 106-113 on the network 102 
may comprise a router or a network switch. Further, each of the 
devices 104-105 may comprise a client, a server, or a gateway to 
another network. Moreover, the network 102 may comprise a Local 
Area Network (LAN), a Wide Area Network (WAN), a global computer 
network such as the Internet, or any other network configured to
communicably couple the devices 104-105 to one another.

Those of ordinary skill in the art will appreciate that a
Class of Services (CoS) contract may be formed between an operator
of a communications network and a user of the network specifying
the user's parameters for transmitting data on the network. For
example, the user of the network 102 may be a user of the device
104 coupled to the network 102 at the node 106, and the CoS 
contract may specify that user's bandwidth for transmitting 
packets over the network 102. Accordingly, the user of the device 
104 may transmit packets at or below the data transmission rate(s) 
specified in the CoS contract or in bursts so long as the 
bandwidth requirements of the CoS contract are not exceeded over 
time.

Fig. 2 depicts an illustrative embodiment of the node 106 on
the communications network 102 (see Fig. 1). Because the nodes
106-113 are similarly configured to transmit packets over the
network 102 from the source device 104 to the destination device
105 via the data paths 120, each of the nodes 107-113 may be 
configured like the node 106, as depicted in Fig. 2. 

In the illustrated embodiment, the node 106 comprises a
network switch 200, which includes one or more input ports 1-P
communicably coupled to the device 104 and one or more output line
cards 1-Q, each card having one or more output ports communicably
coupled to respective data paths 120 in the network 102 (see Fig.
1). For example, the switch 200 may comprise an IP-centric switch
configured to run a protocol package such as TCP/IP for
implementing data communications between the nodes 106-113. The
switch 200 further includes a cross-connect 202 configured to
allow each of the output cards 1-Q to receive digital data in the
form of, e.g., packets from any one of the input ports 1-P. For
example, the switch 200 may determine the appropriate output
port(s) for a particular packet by accessing information contained
in a header field of the packet. Because the respective output
cards 1-Q may receive packets from any one of the input ports 1-P,
each card includes a memory 700 (see Fig. 7) comprising at least
one time-based queue configured to buffer data corresponding to at
least one packet flow for a particular class of service.

Fig. 3 depicts a conceptual representation of a time-based
queue 300 included in the memory of the output card 1 of the
network switch 200 (see Fig. 2). It should be understood that 
each of the output cards 1-Q of the switch 200 includes a
respective memory comprising one or more time-based queues like
the time-based queue 300. In the illustrated embodiment, the
time-based queue 300 comprises a linear time-indexed array
configured to buffer packet data from a plurality of packet flows
1-N provided to the queue 300 by the cross-connect 202 (see Fig.
2). For example, the plurality of packet flows 1-N may be
associated with a respective channel between the device 104 and
the node 106. 

As shown in Fig. 3, the time-based queue 300 includes a 
plurality of storage locations, each location corresponding to a 
respective timeslot within a time window ranging from t=0 to t=T_w.
For example, the time window for a linear time-indexed array
employed in an IP-centric network switch may have a duration of
about 500 msecs (i.e., the time window may range from t=0 to about
t=T_w=500 msecs). It is noted that IP-packets are normally not
held in such arrays for much longer than about 100 msecs, after
which TCP retransmissions of the packets are likely to occur.
Each packet provided to the output card 1 by the cross-connect 202
comprises a data structure including a packet buffer, an
associated packet descriptor including a pointer to the packet
buffer, and an associated timestamp value corresponding to some
virtual or actual time. For example, the switch 200 may employ an
algorithm such as a Weighted-Fair Queuing (WFQ) scheduling
algorithm or any other suitable algorithm for computing each
respective timestamp value upon arrival of the packet at the
switch 200. Further, each timestamp value may have a granularity
on the order of a minimum packet size, e.g., 64 or 128 bytes.

In the illustrated embodiment, the network switch 200 (see
Fig. 2) is configured to insert each packet descriptor into a
respective timeslot of the time-based queue 300 (see Fig. 3), as
indexed by the packet's associated timestamp value. For example,
Fig. 3 shows packet descriptor P1 being inserted into the timeslot
at t=T_9, packet descriptor P2 being inserted into the timeslot at
t=T_6, packet descriptor P3 being inserted into the timeslot at
t=T_{w-12}, packet descriptor P4 being inserted into the timeslot at
t=T_{14}, and packet descriptor P5 being inserted into the timeslot
at t=T_{13}. Further, each respective timeslot has a 1-bit "Used"
variable associated therewith, which is asserted by the switch 200
to indicate whether the respective timeslot contains a valid
packet descriptor. It is understood that alternative techniques
may be employed to indicate that a respective timeslot contains a
packet descriptor. The switch 200 is further configured for
periodically extracting the packet descriptor having the lowest
timestamp value from the time-based queue 300, and transmitting
the packet associated therewith over the network 102 via the data
paths 120 (see Fig. 1).

Fig. 4 depicts an illustrative embodiment of the output card
1 included in the network switch 200 (see Fig. 2). In the
illustrated embodiment, the output card 1 includes the memory
comprising the time-based queue (specifically, the linear time-
indexed array 300), a queue controller 400, and an acceleration
bit-string memory 402. The queue controller 400 is configured to
control the insertion and extraction of packet descriptors into
and out of the linear time-indexed array 300. Further, the
acceleration bit-string memory 402 is configured to store one or
more acceleration bit-strings, which are generated and
subsequently used by the queue controller 400 to identify the
packet descriptor in the array 300 with the lowest timestamp
value.

For example, the network 102 (see Fig. 1) may comprise a
fiber optic network conforming to the Synchronous Optical NETwork
(SONET) standard. For such a network operating at Optical Carrier
speeds ranging from, e.g., OC-3 (155.52 Mbps) to OC-12 (622.08
Mbps), the queue controller 400 may be configured to use three
levels of acceleration bit-strings to identify the packet
descriptor in the linear time-indexed array 300 with the lowest-
valued timestamp. Further, the linear time-indexed array 300 may
be configured to include 1-4 million entries of, e.g., 16 or 32
bits each to store the packet descriptors. It is understood that
such a fiber optic network is merely exemplary and that the
presently disclosed scheduling technique may be employed with any
suitable network configuration.

In the illustrated embodiment, the acceleration bit-string
memory 402 (see Fig. 4) includes three Random Access Memories
(RAMs) 1-3 configured to store first, second, and third levels of
acceleration bit-strings, respectively. For example, the RAM 1
may comprise a register having a suitable width for storing the
first level of acceleration bit-strings, and the RAMs 2-3 may
comprise memories of suitable size for storing the second and 
third levels of acceleration bit-strings, respectively. 

An illustrative method of inserting at least one packet
descriptor into a respective timeslot of the linear time-indexed
array 300 and generating a plurality of acceleration bit-strings
corresponding thereto for storage in the acceleration bit-string
memory 402 is represented in pseudo code in Fig. 8, in which A[k]
denotes an illustrative array A indexed by an integer k, the
result being a bit-string having a width equal to that of A's
respective entries, B<p> denotes the p-th bit of an illustrative
bit-string B, C||D denotes an illustrative bit-string C
concatenated with an illustrative bit-string D, C being the
higher-order bits if the result is interpreted as an integer, E <-
F denotes an illustrative element E being set to an illustrative
value F, and "1" is a numeric literal value of appropriate bit-
string length.

Accordingly, when a packet descriptor P having a timestamp 
value T associated therewith is received at the output card 1 (see 
Figs. 3-4), T is provided to the queue controller 400, which 
partitions T into a plurality of sub-fields T_1, T_2, and T_3
corresponding to the first, second, and third level acceleration
bit-strings and a sub-field T_A indexing one of several timeslots
in the array 300. P is then inserted into the array 300, as
indexed by T. Next, the T_1-th bit of a first level acceleration
bit-string RAM1 is asserted, the T_2-th bit of a second level
acceleration bit-string RAM2[T_1] is asserted, and the T_3-th bit of a
third level acceleration bit-string RAM3[T_1||T_2] is asserted. It
is noted that the insertion of P into the linear time-indexed
array 300 and the assertion of the respective bits in the first,
second, and third level acceleration bit-strings are controlled by
the queue controller 400 via the Addr, Ctrl, and Data lines (see
Fig. 4).
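
The partitioning step can be sketched in Python as follows. This is a simplified illustration and not the pseudo code of Fig. 8: it assumes three equal sub-fields of 6 bits each (matching a 64-bit word width), omits the additional sub-field T_A, and uses hypothetical function names.

```python
def split_timestamp(t, bits=6):
    """Split t into sub-fields (T1, T2, T3), T1 holding the highest-order bits."""
    mask = (1 << bits) - 1
    return (t >> (2 * bits)) & mask, (t >> bits) & mask, t & mask

def bits_to_assert(t, bits=6):
    """List (memory name, word address, bit index) for each level touched by t."""
    t1, t2, t3 = split_timestamp(t, bits)
    return [
        ("RAM1", 0, t1),                  # single first-level bit-string
        ("RAM2", t1, t2),                 # second level, addressed by T1
        ("RAM3", (t1 << bits) | t2, t3),  # third level, addressed by T1||T2
    ]

touched = bits_to_assert(0b000001_000010_000011)
```

Bit 0 is taken to be the least-significant bit of each word, so that lower timestamps map to lower-numbered bits.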

An illustrative method of priority encoding the first,
second, and third level acceleration bit-strings stored in the
acceleration bit-string memory 402 and extracting the packet
descriptor of the next packet having the lowest-valued timestamp
from the linear time-indexed array 300 for subsequent transmission
over the network is represented in pseudo code in Fig. 9, in which
the word width of RAM1-RAM3 (see Fig. 4) may be equal to 64 or
128, and PRI(G) denotes priority encoding of an illustrative bit-
string G. In the presently disclosed embodiment, PRI(G)
implements a "low-wins" priority encoding technique that returns
the bit index of the least-significant (i.e., lowest-numbered) "1"
bit in the bit-string G.
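
A software stand-in for such a low-wins priority encoder might look like the Python sketch below. The function name pri and the error raised for an all-zero input are assumptions of this sketch; a hardware encoder would typically signal the empty case differently.

```python
def pri(g):
    """Low-wins priority encode: index of the least-significant 1 bit of g."""
    if g == 0:
        raise ValueError("bit-string has no asserted bit")
    # g & -g isolates the lowest set bit; bit_length() - 1 converts it to an index.
    return (g & -g).bit_length() - 1

lowest = pri(0b101000)
```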

Fig. 5 depicts a block diagram of the queue controller 400
performing the illustrative priority encoding method of Fig. 9
using the acceleration bit-string memory 402 (see Fig. 4). In the
illustrated embodiment, the RAM 1 provides the first level
acceleration bit-string to a Priority Encoder 1, which performs
priority encoding on the bit-string to obtain x1. Next, x1 is
employed as an address to access the second level acceleration
bit-string RAM2[x1] from the RAM 2, which provides RAM2[x1] to a
Register 2. The Register 2 then provides the second level
acceleration bit-string RAM2[x1] to a Priority Encoder 2, which
performs priority encoding on the bit-string to obtain x2. Next,
x1 and x2 are concatenated and employed as an address to access
the third level acceleration bit-string RAM3[x1||x2] from the RAM
3, which provides RAM3[x1||x2] to a Register 3. The Register 3
then provides the third level acceleration bit-string RAM3[x1||x2]
to a Priority Encoder 3, which performs priority encoding on the
bit-string to obtain x3. Next, x1, x2, and x3 are concatenated to
form a priority-encoded acceleration bit-string for indexing the
linear time-indexed array 300 to identify the packet descriptor P
having the timestamp with the lowest value.

The identified packet descriptor P is then extracted from
the linear time-indexed array 300 and the corresponding packet is
scheduled as the next packet to be transmitted over the network.
In the event all of the entries of the array 300 indexed from X to
X+N-1 are now marked as unused, the x3-th bit of the third level
acceleration bit-string RAM3[x1||x2] is de-asserted. Further, in
the event RAM3[x1||x2]=0, the x2-th bit of the second level
acceleration bit-string RAM2[x1] is de-asserted. Moreover, in the
event RAM2[x1]=0, the x1-th bit of the first level acceleration
bit-string RAM1 is de-asserted. It is noted that the priority
encoding of the first, second, and third level acceleration bit-
strings stored in the acceleration bit-string memory 402 and the
extracting of the packet descriptor with the lowest-valued
timestamp from the linear time-indexed array 300 are controlled by
the queue controller 400.
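
The three-stage lookup and the subsequent clearing of acceleration bits can be sketched end to end in Python. This is a simplified model and not the pseudo code of Figs. 8 and 9: it assumes 3-bit sub-fields (8-bit words) to keep the example small, one descriptor per timeslot, no T_A sub-field, and a dictionary in place of the linear array. All names are hypothetical.

```python
BITS = 3                      # 8-bit words for brevity; the text suggests 64 or 128
MASK = (1 << BITS) - 1

def pri(g):
    """Low-wins priority encode: index of the least-significant 1 bit."""
    return (g & -g).bit_length() - 1

def make_queue():
    return {"array": {}, "ram1": 0,
            "ram2": [0] * (1 << BITS),
            "ram3": [0] * (1 << (2 * BITS))}

def insert(q, t, desc):
    t1, t2, t3 = (t >> (2 * BITS)) & MASK, (t >> BITS) & MASK, t & MASK
    q["array"][t] = desc
    q["ram1"] |= 1 << t1
    q["ram2"][t1] |= 1 << t2
    q["ram3"][(t1 << BITS) | t2] |= 1 << t3

def extract_min(q):
    """Return (timestamp, descriptor) with the lowest timestamp, or None if empty."""
    if q["ram1"] == 0:
        return None
    x1 = pri(q["ram1"])
    x2 = pri(q["ram2"][x1])
    addr3 = (x1 << BITS) | x2             # x1||x2
    x3 = pri(q["ram3"][addr3])
    t = (addr3 << BITS) | x3              # x1||x2||x3 indexes the array
    desc = q["array"].pop(t)
    q["ram3"][addr3] &= ~(1 << x3)        # timeslot is now unused
    if q["ram3"][addr3] == 0:
        q["ram2"][x1] &= ~(1 << x2)
        if q["ram2"][x1] == 0:
            q["ram1"] &= ~(1 << x1)
    return t, desc

q = make_queue()
for t in (300, 17, 511, 18):
    insert(q, t, f"P{t}")
order = [extract_min(q) for _ in range(4)]
```

Each extraction reads exactly three words regardless of how many descriptors are queued, which is the point of the acceleration bit-strings.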

It is further noted that the above-described packet 
descriptor inserting method could potentially insert a plurality 
of packet descriptors, each packet descriptor having the same
timestamp value, into the same timeslot of the array 300. As a
result, there may be a "collision" of packet descriptors at that
timeslot of the array 300. 

For this reason, the bit-string ARRAY[T] is read before
inserting P into that bit-string location to determine whether the
corresponding Used variable is asserted, thereby indicating that
the bit-string ARRAY[T] contains a packet descriptor. In the
presently disclosed embodiment, a linked list of colliding packet
descriptors is chained to each timeslot of the linear time-indexed
array 300 to resolve potential collisions of packet descriptors.
In an alternative embodiment, the array 300 may provide a
plurality of enqueued packets (e.g., 2 or 4) for each timestamp
value.

In the event a linked list of packet descriptors is employed
for collision resolution, the packets may be removed in any order
since they are from separate data flows if they have the same
timestamp. It is noted that this timeslot is considered "used" so
long as the linked list pointed to by the timeslot contains at
least one packet descriptor. A timeslot is considered "used" so
long as at least one packet for the timestamp contains a packet
descriptor.
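
A minimal Python sketch of the chained-collision idea follows. It uses a deque per timeslot as a stand-in for the linked list, and the names slots, enqueue, and dequeue are hypothetical. The boolean returned by dequeue models whether the timeslot is still "used", which is what would decide whether the corresponding acceleration bit may be de-asserted.

```python
from collections import defaultdict, deque

slots = defaultdict(deque)  # timeslot index -> chain of colliding descriptors

def enqueue(t, desc):
    """Chain desc onto timeslot t; the slot counts as used once the chain is non-empty."""
    slots[t].append(desc)

def dequeue(t):
    """Remove one descriptor from timeslot t and report whether the slot is still used."""
    desc = slots[t].popleft()
    return desc, bool(slots[t])

enqueue(5, "P1")
enqueue(5, "P2")           # same timestamp: collision at timeslot 5
first = dequeue(5)
second = dequeue(5)
```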

In the presently disclosed embodiment, the number space
employed by the timestamp values is circular, which may cause some
uncertainty as to whether a particular timestamp value T belongs
to a current time interval "I" (in which I counts increments of
the time window ranging from t=0 to t=T_w), a past time interval
"I-1", or a future time interval "I+1". Fig. 6a depicts the
above-mentioned time intervals I-1, I, and I+1. For example, a
time window greater than T_w spanning rectangles A, B, and C may
represent the range of timestamp values in use at a particular
instant of time. Further, a current virtual time may be indicated
at T_c, which corresponds to the timestamp of the last packet
descriptor extracted from the linear time-indexed array 300 (see
Fig. 3). As shown in Fig. 6a, the ranges of timestamp values
represented by the rectangles A and C within the time intervals I-1
and I+1, respectively, get effectively re-mapped over the time
interval I containing the rectangle B. Specifically, the
timestamp values corresponding to the rectangle A get re-mapped
late in time interval I, and the timestamp values corresponding to
the rectangle C get re-mapped early in time interval I. As a
result, any packets having timestamp values within the ranges
represented by the rectangles A and C may be transmitted over the
network in the wrong order. 
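
The re-mapping problem can be reproduced with a toy Python example. The window size of 16 timeslots and the sample timestamps are illustrative assumptions; the sketch only shows that ordering by wrapped slot index can differ from ordering by true time.

```python
T_W = 16  # window size in timeslots (illustrative)

def slot(t):
    """Timeslot index that a timestamp maps to in a circular number space."""
    return t % T_W

# Timestamps 17 and 18 lie in interval I+1 but wrap to slots 1 and 2 of interval I.
true_times = [14, 15, 17, 18]
served_order = sorted(true_times, key=slot)  # order a lowest-slot-first search would use
```

Here the packets stamped 17 and 18 would be served before those stamped 14 and 15.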

As shown in Fig. 6b, the range of usable timestamp values
may alternatively be represented by a rectangle D, which limits
the timestamp value range to less than T_w. Further, the current
virtual time may be indicated at T_c. However, even though the
range of timestamp values is now less than T_w, the range of
timestamp values represented by the rectangle E within the time
interval I+1 gets effectively re-mapped over the time interval I
containing the rectangle D. Specifically, the timestamp values
corresponding to the rectangle E get re-mapped early in time
interval I. As a result, any packets having timestamp values
within the range represented by the rectangle E may be transmitted
over the network in the wrong order.

In the presently disclosed embodiment, the above-described
timestamp value uncertainty is resolved by limiting the range of
timestamp values to T_w/2. As shown in the top diagram of Fig. 6c,
ranges of usable timestamp values are represented by rectangles F,
G, and H (each of which represents a timestamp value range of T_w/2
or less). Although no re-mapping occurs for the timestamp values
represented by the rectangles F and G, a portion of the range of
timestamp values represented by the rectangle H within the time
interval I+1 gets re-mapped early in time interval I.

As further shown in the top diagram of Fig. 6c, when the
current virtual time T_c reaches T_w/2 within the interval I (i.e.,
when T_c ≥ T_w/2, at which point the value of the Most Significant
Bit (MSB) or the "sign bit" changes from logical 0 to 1), the
first half of the timestamp number space within each of the
intervals I and I+1 is empty. This means that the first half of
the time interval I can conceptually be employed as the first half
of the time interval I+1, as shown in the middle diagram of Fig.
6c, in which the time interval I is effectively shifted forward in
time by T_w/2 to yield the new time interval I+½ (see also the
bottom diagram of Fig. 6c). As a result, the range of timestamp
values represented by the rectangle H is now completely within the
time interval I+½, and any re-mapping of these timestamp values is
prevented.
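
By way of illustration only, the effect of the half-interval shift may be sketched in Python as follows. The sketch is not part of the disclosed embodiment; the value T_W = 16, the function names, and the sample packet times are assumptions chosen to keep the arithmetic small. It models a stored timestamp as the true time modulo the size of the timestamp number space, and compares the order in which a lowest-timestamp-first search visits a group of packets when the search starts at the beginning of the interval I and when it starts at the midpoint of the interval I.

```python
T_W = 16  # hypothetical size of the timestamp number space

def wrap(true_time):
    """Stored timestamp: only the true time modulo T_W survives."""
    return true_time % T_W

def extraction_order(true_times, shifted):
    """Order in which a lowest-timestamp-first search visits the packets.

    shifted=False: the search starts at stored timestamp 0 (interval I).
    shifted=True:  the search starts at T_W/2 (interval shifted by T_W/2).
    """
    start = T_W // 2 if shifted else 0
    return sorted(true_times, key=lambda t: (wrap(t) - start) % T_W)

# A range no wider than T_W/2 that straddles the end of interval I.
# The true time 17 lies in interval I+1 and wraps to stored timestamp 1.
packets = [10, 13, 17]

unshifted_order = extraction_order(packets, shifted=False)  # 17 is visited first
shifted_order = extraction_order(packets, shifted=True)     # true order is kept
```

In this sketch the unshifted search visits the wrapped packet ahead of the two earlier packets, whereas the shifted search visits the three packets in order of their true times.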

For example, the above-described shifting of the time
interval I, as shown in Fig. 6c, may be implemented by replacing
the first line of pseudo code in the priority encoding method of
Fig. 9 (i.e., "x1 <- PRI(RAM1)") with the pseudo code

    // S = Shift State (the sign bit of T_c)
    if S = 1
    then
        x1 <- PRI(SWAP(RAM1)) + (SIZEOF(RAM1)/2)
    else
        x1 <- PRI(RAM1)

and adding the pseudo code

    if ((S = 0) AND (SIGN(PRI(RAM1)) = 1))
    then S <- 1
    else if ((S = 1) AND (SIGN(PRI(SWAP(RAM1))) = 0))
    then S <- 0

after the last line of pseudo code of Fig. 9 (i.e., "RAM1<x1> <- 0"),
in which SWAP(J) returns Jl || Jh, Jl being the lower half of an
illustrative bit-string J and Jh being the upper half of the
bit-string J, SIZEOF(K) returns the number of bits in an illustrative
bit-string K, and SIGN(L) returns the sign bit of an illustrative
bit-string L.
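
The helper operations named in the foregoing pseudo code may be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the method of Fig. 9: a bit-string is modeled as a Python list of 0/1 values whose index 0 is the earliest position, PRI is assumed to return the index of the first asserted bit (or None for an empty bit-string), and the addition of SIZEOF(RAM1)/2 is assumed to wrap modulo the bit-string length, as a fixed-width adder would. The update of the shift state S is omitted because it depends on the remaining lines of Fig. 9, which are not reproduced here.

```python
def sizeof(bits):
    """Number of bits in the bit-string."""
    return len(bits)

def swap(bits):
    """Exchange the two halves of the bit-string."""
    half = len(bits) // 2
    return bits[half:] + bits[:half]

def pri(bits):
    """Priority encoder: index of the first asserted bit, or None if empty."""
    for index, bit in enumerate(bits):
        if bit:
            return index
    return None

def sign(index, size):
    """Sign bit (MSB) of an index into a bit-string of the given size."""
    return 1 if index >= size // 2 else 0

def first_used_index(ram1, s):
    """Compute x1 for shift state s (assumed wrap-around addition)."""
    if s == 1:
        p = pri(swap(ram1))
        if p is None:
            return None
        return (p + sizeof(ram1) // 2) % sizeof(ram1)
    return pri(ram1)

# Bits asserted at positions 1 and 6 of an 8-bit string.
ram1 = [0, 1, 0, 0, 0, 0, 1, 0]
x1_unshifted = first_used_index(ram1, 0)  # search starts at position 0
x1_shifted = first_used_index(ram1, 1)    # search starts at position 4
```

With S = 1 the search in this sketch begins at the midpoint of the bit-string, so a bit in the second half is found before a bit in the first half, which then stands for the first half of the following interval.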

In the illustrated embodiment, the memory of the output card
1 of the network switch 200 (see Fig. 2) has a size that is
sufficient to support the total bandwidth of the output of the
cross-connect 202 coming into the card, which may provide packets
from a plurality of packet flows associated with a respective
channel in the network. In the event the switch 200 receives
packets from a plurality of packet flows associated with a
multiplicity of channels, the output card memory is configured to
include a plurality of time-based queues, in which the number of
queues corresponds to the number of channels handled by the card.
Further, in this embodiment, the size of each time-based queue is
proportional to the fractional amount of the total bandwidth
capacity used by the corresponding channel.

Fig. 7 depicts the memory 700 of the output card 1, which is
configured to include a plurality of time-based queues 702.1-702.M.
It is understood that each of the queues 700.1-700.M
includes a plurality of storage locations, each location
corresponding to a respective timeslot within a time window having
a duration of, e.g., 50 msecs for an IP-centric switch. Further,
the packet descriptor inserting method of Fig. 8 and the priority
encoding method of Fig. 9 may be employed to manage the insertion
and extraction of packets into and out of the respective queues
700.1-700.M on a per queue basis. It should be appreciated that
the configuration of the output card memory 700, as shown in Fig.
7, is merely exemplary and that the memory 700 may include any
suitable number of queues conforming to the same or different
bandwidth requirements.

The presently disclosed embodiment of the output card memory
700 including the plurality of queues 700.1-700.M will be better
understood with reference to the following illustrative example.
In this example, each of the queues 700.1-700.11 (M=11) is
configured to provide 50 msecs of storage for packets according to
the bandwidth requirements of the corresponding channel.
Specifically, the queue 700.1 is configured to provide 50 msecs of
storage for packets from a corresponding channel conforming to the
bandwidth requirements of OC-24. Similarly, the queues 700.2-700.11
are configured to provide 50 msecs of storage for packets
from corresponding channels conforming to bandwidth requirements
ranging from OC-12 to T-1, respectively. It is noted that such a
configuration of the output card memory 700 may be employed in a
network switch that supports a total aggregate bandwidth of OC-48.
Further, the RAMs 1-3 of the acceleration bit-string memory
402 (see Fig. 4) are configured as a 1 word x 128 bit register, a
128 word x 128 bit memory, and a 16,000 word x 16 bit memory,
respectively, and the linear time-indexed array 300 is configured
as a 256,000 word x 16 bit queue. When a packet descriptor is
inserted into a respective timeslot of the queues 700.1-700.11,
the addresses of the respective "Used" bits asserted in the RAMs
1-3 and the memory 700 are determined using

    used_bit_addr[21:0] =
        {   {                timestamp[21:0]} & {22{ncb==4'h0}}
          | {channel[10],    timestamp[20:0]} & {22{ncb==4'h1}}
          | {channel[10:9],  timestamp[19:0]} & {22{ncb==4'h2}}
          | {channel[10:8],  timestamp[18:0]} & {22{ncb==4'h3}}
          | {channel[10:7],  timestamp[17:0]} & {22{ncb==4'h4}}
          | {channel[10:6],  timestamp[16:0]} & {22{ncb==4'h5}}
          | {channel[10:5],  timestamp[15:0]} & {22{ncb==4'h6}}
          | {channel[10:4],  timestamp[14:0]} & {22{ncb==4'h7}}
          | {channel[10:3],  timestamp[13:0]} & {22{ncb==4'h8}}
          | {channel[10:2],  timestamp[12:0]} & {22{ncb==4'h9}}
          | {channel[10:1],  timestamp[11:0]} & {22{ncb==4'hA}}
          | {channel[10:0],  timestamp[10:0]} & {22{ncb==4'hB}}
        };

in which "|" denotes the logical OR operation, "&" denotes the
logical AND operation, "{...}" denotes a vector, "channel[10]"
through "channel[10:0]" correspond to the queues 700.1-700.11,
"ncb" is a number-of-channel-bits indication used to control the
division of the output card memory 700 into the plurality of queues
700.1-700.11, and "4'h0" through "4'hB" denote respective 4-bit
hexadecimal numbers.
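
The address selection expressed above may equivalently be sketched in Python as follows. The function name and constants are illustrative assumptions; the sketch simply places the ncb most significant bits of the 11-bit channel number ahead of the (22 - ncb) least significant bits of the timestamp, which is the result selected by each of the twelve cases above.

```python
CHANNEL_BITS = 11  # width of channel[10:0]
ADDR_BITS = 22     # width of used_bit_addr[21:0]

def used_bit_addr(channel, timestamp, ncb):
    """Form the 22-bit address from ncb channel bits and 22-ncb timestamp bits."""
    if not 0 <= ncb <= CHANNEL_BITS:
        raise ValueError("ncb must be in the range 0..11")
    ntsb = ADDR_BITS - ncb
    # Keep the ncb left-most bits of the channel number.
    channel_part = (channel & ((1 << CHANNEL_BITS) - 1)) >> (CHANNEL_BITS - ncb)
    # Keep the ntsb right-most bits of the timestamp.
    timestamp_part = timestamp & ((1 << ntsb) - 1)
    return (channel_part << ntsb) | timestamp_part

# ncb = 5: the five left-most channel bits followed by a 17-bit timestamp.
addr = used_bit_addr(0b101_0100_0000, 0x1ABCD, ncb=5)
```

For ncb = 0 the address in this sketch reduces to timestamp[21:0], and for ncb = 11 it consists of the full channel number followed by timestamp[10:0].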

Moreover, acceleration bit-strings are stored in and
recovered from the RAMs 1-3 and the memory 700 according to the
following TABLE:

TABLE

ncb   ntsb   RAM1   RAM2   RAM3   MEM700   bandwidth
  0     22      7      7      4        4   OC-48
  1     21      6      7      4        4   OC-24
  2     20      5      7      4        4   OC-12
  3     19      4      7      4        4   OC-6
  4     18      3      7      4        4   OC-3
  5     17      2      7      4        4
  6     16      1      7      4        4   DS-3
  7     15      0      7      4        4
  8     14      0      6      4        4
  9     13      0      5      4        4
 10     12      0      4      4        4

in which "ntsb" is a number-of-timestamp-bits indication, and the
numbers listed under RAM1, RAM2, RAM3, and MEM700 denote the
number of bits stored in the respective memories contributing to
the formation of the priority-encoded acceleration bit-string used
for indexing the queues to identify the packet descriptor P having
the timestamp with the lowest value. It is noted that for the
total aggregate bandwidth of OC-48, in which ncb=0 and ntsb=22,
there are more than 4 million (i.e., 2^22) available timestamp
values. In contrast, for the slower speed T-1 bandwidth, in which
ncb=ntsb=11, there are 2,048 (i.e., 2^11) available timestamp
values.
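
The relationship between ncb and the number of available timestamp values may be checked with the following short Python sketch. The function names are illustrative. The timeslot duration computed by the sketch rests on an assumption not stated in the text, namely that the exemplary 50 msec time window is divided evenly among the available timestamp values of a queue.

```python
ADDR_BITS = 22           # ncb + ntsb for every row of the TABLE
WINDOW_SECONDS = 0.050   # exemplary 50 msec time window

def timestamp_values(ncb):
    """Number of available timestamp values for a queue using ncb channel bits."""
    return 1 << (ADDR_BITS - ncb)

def timeslot_seconds(ncb):
    """Timeslot duration, ASSUMING the window is divided evenly."""
    return WINDOW_SECONDS / timestamp_values(ncb)

oc48_values = timestamp_values(0)    # ncb=0, ntsb=22
t1_values = timestamp_values(11)     # ncb=ntsb=11

# Queues 700.1-700.11 use ncb = 1..11; their combined size stays
# within the 2^22 locations addressed by used_bit_addr[21:0].
combined = sum(timestamp_values(ncb) for ncb in range(1, 12))
```

Under the stated assumption, a timeslot of the T-1 queue in this sketch lasts roughly 24 microseconds, and a timeslot at the full OC-48 rate lasts roughly 12 nanoseconds.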

In this example, an exemplary queue corresponding to
channel[10:6]=11'b101_0100_0000 is included in the memory 700, in
which "11'b101_0100_0000" denotes an 11-bit binary number, ncb=5,
and ntsb=17. When a packet descriptor is inserted into a
respective timeslot of this queue, a first bit is asserted in the
RAM 1 at used_bit_addr[21:15], a second bit is asserted in the RAM
2 at used_bit_addr[21:8], a third bit is asserted in the RAM 3 at
used_bit_addr[21:4], and a fourth bit is asserted in the memory
700 at used_bit_addr[21:0], in which used_bit_addr[21:0] =
{channel[10:6], timestamp[16:0]}.

When a packet descriptor is to be extracted from this queue,
the 22nd group of 4 bits is identified in the RAM 1 (because the
five left-most bits of channel[10:6] are "10101" binary, which is
21 decimal). Next, priority encoding is performed on these four
bits to obtain the 2-bit contribution of the RAM 1 to the
formation of the priority-encoded timestamp[16:0]. This 2-bit
contribution is denoted as timestamp[16:15]. Priority encoding is
then performed on the 128-bit word stored at the 7-bit address
{channel[10:6], timestamp[16:15]} of the RAM 2 to obtain the 7-bit
contribution of the RAM 2 to the formation of the timestamp[16:0],
which is denoted timestamp[14:8]. Next, priority encoding is
performed on the 16-bit word stored at the 14-bit address
{channel[10:6], timestamp[16:8]} of the RAM 3 to obtain the 4-bit
contribution of the RAM 3 to the formation of the timestamp[16:0],
which is denoted timestamp[7:4]. Priority encoding is then
performed on the 16-bit word stored at the 18-bit address
{channel[10:6], timestamp[16:4]} of the memory 700 to obtain the
4-bit contribution of the memory 700 to the formation of the
timestamp[16:0], which is denoted timestamp[3:0]. Next, the bits
denoted as timestamp[16:15], timestamp[14:8], timestamp[7:4], and
timestamp[3:0] are concatenated to form the timestamp[16:0], which
is then used with the channel[10:6] in extracting, from the memory
700, the packet descriptor having the timestamp with the lowest
value. The packet associated with that packet descriptor is then
transmitted over the network.
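
The four-level search described above for the exemplary queue (ncb=5, ntsb=17) may be sketched in Python as follows. This is an illustrative model only: each memory is represented by a Python set holding the addresses of its asserted "Used" bits rather than by words of RAM, the function names are assumptions, and clearing of bits after extraction is omitted.

```python
NCB = 5                  # channel bits used by the exemplary queue
NTSB = 22 - NCB          # 17 timestamp bits
SHIFTS = (15, 8, 4, 0)   # address bits dropped for RAM 1, RAM 2, RAM 3, memory 700
WIDTHS = (2, 7, 4, 4)    # timestamp bits resolved per level (TABLE row ncb=5)

def make_memories():
    """One set of asserted "Used" bit addresses per memory (stand-in for RAM)."""
    return [set() for _ in SHIFTS]

def insert(mems, channel_bits, timestamp):
    """Assert one "Used" bit per level for {channel[10:6], timestamp[16:0]}."""
    addr = (channel_bits << NTSB) | (timestamp & ((1 << NTSB) - 1))
    for mem, shift in zip(mems, SHIFTS):
        mem.add(addr >> shift)

def priority_encode(mem, prefix, width):
    """Lowest offset whose bit {prefix, offset} is asserted, or None."""
    for offset in range(1 << width):
        if ((prefix << width) | offset) in mem:
            return offset
    return None

def lowest_timestamp(mems, channel_bits):
    """Concatenate the per-level contributions into timestamp[16:0]."""
    prefix = channel_bits
    for mem, width in zip(mems, WIDTHS):
        offset = priority_encode(mem, prefix, width)
        if offset is None:
            return None          # the queue is empty
        prefix = (prefix << width) | offset
    return prefix & ((1 << NTSB) - 1)

mems = make_memories()
for ts in (0x1ABCD, 0x00123, 0x10000):
    insert(mems, 0b10101, ts)       # the exemplary queue
insert(mems, 0b00001, 0x00005)      # an unrelated queue
lowest = lowest_timestamp(mems, 0b10101)
```

Each level in this sketch narrows the search to the word selected by the bits already resolved, so the lowest timestamp is found in four priority-encoding steps rather than by scanning all 2^17 timeslots of the queue.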

It will further be appreciated by those of ordinary skill in
the art that modifications to and variations of the above-described
technique for scheduling the transmission of packets
from a multiplicity of time-based queues may be made without
departing from the inventive concepts disclosed herein.
Accordingly, the invention should not be viewed as limited except
as by the scope and spirit of the appended claims.